The Use of Autologous Chondrocyte and Mesenchymal Stem Cell Implants for the Treatment of Focal Chondral Defects in Human Knee Joints—A Systematic Review and Meta-Analysis

Focal chondral defects of the knee occur commonly in the young, active population due to trauma. Damage can insidiously spread and lead to osteoarthritis with significant functional and socioeconomic consequences. Implants consisting of autologous chondrocytes or mesenchymal stem cells (MSC) seeded onto scaffolds have been suggested as promising therapies to restore these defects. However, the degree of integration between the implant and native cartilage still requires optimization. A PRISMA systematic review and meta-analysis was conducted using five databases (PubMed, MEDLINE, EMBASE, Web of Science, CINAHL) to identify studies that used autologous chondrocyte implants (ACI) or MSC implant therapies to repair chondral defects of the tibiofemoral joint. Data on the integration of the implant-cartilage interface, as well as outcomes of clinical scoring systems, were extracted. Most eligible studies investigated the use of ACI only. Our meta-analysis showed that, across a total of 200 patients, 64% (95% CI (51%, 75%)) achieved complete integration with native cartilage. In addition, a pooled improvement in the mean MOCART integration score was observed during post-operative follow-up (standardized mean difference: 1.16; 95% CI (0.07, 2.24), p = 0.04). All studies showed an improvement in the clinical scores. The use of a collagen-based scaffold was associated with better integration and clinical outcomes. This review demonstrated that cell-seeded scaffolds can achieve good quality integration in most patients, which improves over time and is associated with clinical improvements. A greater number of studies comparing these techniques to traditional cartilage repair methods, with more inclusion of MSC-seeded scaffolds, should allow for a standardized approach to cartilage regeneration to develop.


Introduction
Globally, osteoarthritis of the knee is a leading cause of disability, exacerbated by an aging population and the rising prevalence of obesity [1]. The disease incurs substantial economic costs both to healthcare systems and to individual patients [2]. Its progressive nature frequently necessitates joint replacements. A therapy that avoids knee replacement would therefore save significant costs [3,4]. Conservative measures aimed at preventing the need for surgery show mixed results. Exercise improves pain and function, but this is short-lived [5]. Intra-articular analgesics, corticosteroids, hyaluronic acid, autologous serum and platelet rich plasma cannot mitigate the progressive inflammatory process involved [6].
Traumatic injury to the articular cartilage risks the development of knee osteoarthritis at a younger age [7]. Although established osteoarthritis is diffuse, the disease may develop

1. Study characteristics such as study design, cohort size and time of follow-up.
2. Demographic information such as mean age, sex distribution, defect location and defect size.
3. Type of intervention, including type of ACI/MSC and scaffold composition.
4. Molecular status of the cells used in each intervention, including cluster of differentiation (CD) molecule profile and molecular construct of the scaffolds.
5. Primary outcome measures regarding integration between the implant and native cartilage, assessment method (MRI, arthroscopy or histology) and scoring system used to quantify integration.
6. Secondary outcomes, including clinical scores and any surgical complications.

Data Analysis
Regarding integration between the implant and native cartilage, outcomes from comparable MRI-based scoring systems or the proportion of patients achieving complete integration, as seen on MRI, were pooled in meta-analyses.
For studies evaluating integration using MRI, the "integration to the border zone" component was extracted from scoring systems such as the Magnetic Resonance Observation of Cartilage Repair Tissue (MOCART) score, which post-operatively assesses cartilage repair in comparison to adjacent hyaline cartilage [30,31]. The results were sometimes presented as the proportion of patients achieving a predefined MOCART border integration score: (1) complete integration with adjacent cartilage; (2) incomplete integration (split-like border visible); (3) a visible demarcating border of <50% of the length of the repair tissue; or (4) a visible defect >50% of the length of the repair tissue. Other studies used the following scoring system: 1 = poor integration, 2 = fair, 3 = good, 4 = excellent, and a mean MOCART integration score was reported. Results were also extracted from studies that used their own MRI composite scores, where these were deemed comparable with MOCART, and were included in the analysis. As the MOCART scoring system is commonly used to evaluate the overall quality of the implant [30], other pertinent components were extracted, such as the degree of defect filling, how intact the surface of the implant was, the homogeneity of the implant (described as its structure) and the status of the subchondral lamina and bone.
For studies that evaluated integration using second-look arthroscopy or biopsy, either the macroscopic International Cartilage Repair Society (ICRS) or histological ICRS score results were extracted.
Clinical outcomes from six scoring systems were pooled for meta-analysis: (1) the International Knee Documentation Committee (IKDC) score, a patient-reported self-evaluation on whether they can complete certain tasks; (2) the Knee Injury and Osteoarthritis Outcome (KOOS) score, a percentage score evaluating short- and long-term symptoms; (3) the Lysholm score, a score out of 100 examining the presence of specific knee symptoms such as locking and instability; (4) the Tegner Knee Activity Scale (TAS), grading the amount of work and sporting activities possible; (5) the Visual Analogue Scale (VAS), which measures pain intensity; (6) the Short Form-36 Physical and Mental scores (SF-36), short surveys regarding physical and mental health.
Meta-analyses were carried out using RStudio version 4.0.5. For continuous data, the Wan et al. estimator was used where the mean ± standard deviation was not given in the manuscript [32]. Higgins and Thompson's I² statistic and Cochran's Q test were used as measures of heterogeneity [33,34]. Prediction intervals were also included to provide a range into which future studies' effect sizes can be expected to fall. Subgroup analyses were performed according to whether or not collagen was a component of the implanted scaffold.
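To make the estimation step concrete, the following is a minimal Python sketch of the Wan et al. formulas for approximating a mean and standard deviation from reported summary statistics. It is illustrative only: the analyses in this review were run in RStudio, the function names are hypothetical, and which scenario (range-based or quartile-based) applied to each included study is not stated here.

```python
from statistics import NormalDist


def wan_from_range(minimum, median, maximum, n):
    """Approximate (mean, sd) from the minimum, median, maximum and sample size."""
    mean = (minimum + 2 * median + maximum) / 4
    # Expected half-range of n standard normal observations (Blom approximation).
    z = NormalDist().inv_cdf((n - 0.375) / (n + 0.25))
    sd = (maximum - minimum) / (2 * z)
    return mean, sd


def wan_from_iqr(q1, median, q3, n):
    """Approximate (mean, sd) from the quartiles, median and sample size."""
    mean = (q1 + median + q3) / 3
    # Expected position of the upper quartile for a sample of size n.
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd


# Hypothetical inputs, not data from any included study.
print(wan_from_range(2, 5, 8, 25))
print(wan_from_iqr(4, 5, 6, 25))
```

Both functions assume approximately normally distributed data; for strongly skewed outcomes the approximations are less reliable.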

Assessing Risk of Bias
Risk of bias assessments were carried out independently by EL and IEE, and VL was consulted for unresolvable disagreements.
The Cochrane RoB 2.0 tool was used to assess randomized trials according to its five domains [35]: (1) bias from the randomization process; (2) bias due to deviations from the intended interventions; (3) missing outcome data; (4) bias in measurement of the outcome; (5) bias in selection of the reported result. These domains were each assessed as having a low risk, some concerns or high risk of bias, and an overall risk was determined.
The Cochrane ROBINS-I tool was used to assess non-randomized trials according to its seven domains [36]: (1) bias due to confounding variables; (2) bias in the selection of participants into the study; (3) bias in the classification of interventions; (4) bias due to deviations from intended interventions; (5) bias due to missing data; (6) bias in the measurement of outcomes; (7) bias in the selection of the reported result. These domains were assessed as having a low, moderate, serious or critical risk of bias, and an overall risk was determined.
The results of the assessments were presented using the robvis package [37] in RStudio.
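As a rough illustration of how domain-level judgements can be combined into an overall rating, the sketch below takes the most severe domain judgement as the overall risk. This is a simplification under stated assumptions: it matches the ROBINS-I convention, whereas RoB 2 additionally permits an overall rating of high risk when several domains raise some concerns, which remains a reviewer judgement and is not modelled. The function and constant names are hypothetical.

```python
ROB2_LEVELS = ["low", "some concerns", "high"]
ROBINS_I_LEVELS = ["low", "moderate", "serious", "critical"]


def overall_risk(domain_judgements, levels):
    """Return the most severe judgement across domains, given an ordered scale."""
    ranks = [levels.index(judgement) for judgement in domain_judgements]
    return levels[max(ranks)]


# Hypothetical judgements for one randomized and one non-randomized study.
rct = ["some concerns", "low", "low", "low", "low"]
case_series = ["low", "low", "low", "low", "low", "moderate", "low"]
print(overall_risk(rct, ROB2_LEVELS))
print(overall_risk(case_series, ROBINS_I_LEVELS))
```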

Search Results
A total of 963 papers were identified after the initial search on five databases (Figure 1). Following deduplication, 883 papers remained for title and abstract screening. A total of 124 full texts were assessed for eligibility, from which 17 studies were eligible for data synthesis.
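The counts at each stage imply the number of records removed between stages. The short sketch below derives these by subtraction; the derived exclusion counts are arithmetic consequences of the figures above rather than numbers reported in the text, and the variable names are hypothetical.

```python
# Counts reported in the search results.
identified = 963
after_deduplication = 883
full_texts_assessed = 124
included = 17

# Derived by subtraction (not reported directly in the text).
duplicates_removed = identified - after_deduplication
excluded_at_screening = after_deduplication - full_texts_assessed
excluded_at_full_text = full_texts_assessed - included

print(duplicates_removed, excluded_at_screening, excluded_at_full_text)
```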

Characteristics of Selected Studies
Patient demographics, study designs and interventions are displayed in Table 2. Seven included studies were RCTs, and the rest were prospective case series. Two pairs of studies reported results of the same trial at different follow-up time points [38][39][40][41]. All studies investigated the performance of ACI. One of these was a randomized trial comparing MACI with synovium-derived MSCs [42]. Three studies combined the implantation of ACIs with a bone graft for the treatment of osteochondral defects [43][44][45]. One study used a suspension of ACIs in gel composed of fibrinogen and thrombin, which was administered directly onto the chondral defect during surgery and allowed to harden [46]. The locations of the treated chondral defects were reported in 16 of the 17 included studies. Of these, all included patients with defects of the medial and/or lateral femoral condyles (MFC, LFC). Four also included patients with trochlear defects [45][46][47][48], and two included defects of the tibial plateau [46,49]. One study treated patellar defects, but reporting of individual patient data allowed for inclusion [45]. One study followed up patients for between 22 and 47 days [50]. The remaining 16 studies performed a follow-up of at least two years.
The outcomes of integration for each study are displayed in Table 3. Integration outcomes were recorded for radiographic, histological and arthroscopic data. Only two studies investigated the degree of integration using a biopsy [44,48], and another two performed second-look arthroscopy [47,51]. Other imaging outcomes are presented in Table S2. Table S3 summarizes the performance of therapies in improving clinical scores and their surgical complications.

Magnetic Resonance Imaging
All studies, except one [47], used MRI to assess the repair cartilage formed from the implant. Of these, 11 studies used the MOCART scoring system for their evaluation. Three studies used an MRI composite score composed of the same parameters [38,39,53]. Marlovits et al. [50] and Selmi et al. [51] reported integration outcomes using their own scoring systems.
Ebert et al. compared MACI and a traditional rehabilitation model (full weight-bearing at 11 weeks post-operatively) to MACI and accelerated rehabilitation (full weight-bearing at 8 weeks post-operatively) [38,39]. Both showed a significant improvement in their mean integration scores between 3 and 24 months post-operatively [38]. However, at 5-year follow-up, the integration scores had demonstrated a decline when compared to the scores at 24 months and no longer differed significantly from the first assessment at 3 months [39]. There was no significant difference between the two groups at 24 months, nor at 5 years of follow-up. Ebert et al. later performed a trial comparing a 6-week return to full weightbearing with an 8-week return [54]. The groups did not show a statistically significant improvement in their respective integration scores over time, but the mean score of the 6-week group (3.29 ± 0.25) was significantly better compared to that of the 8-week group (2.79 ± 0.23) at 24 months post-operatively.
Akgun et al. compared the use of collagen seeded with synovium-derived MSCs with MACI [42]. The mean integration score for each group improved significantly over the course of the trial. The MSC group had consistently higher integration scores, although the difference between the groups was not statistically significant at any point.
Selmi et al. did not use a scoring system but reported that the transition zone of repair tissue with the adjacent cartilage was smooth and regular in 13 out of 15 patients and that the repair tissue could not be distinguished from the native cartilage in 11 patients [51].

Arthroscopy
Two studies used arthroscopy to assess the quality of the repair following implantation. Selmi et al. used MACI in their prospective case series [51], obtaining a mean total ICRS score of 10 (range 5-12), which would indicate hyaline-like tissue. Out of 13 patients, 9 demonstrated either a completely integrated implant or a gap of less than 1 mm between the implant and the native cartilage. Two patients showed 75% integration of the peripheral margin of the implant and another two showed 50% integration of the margin. Saris et al. performed an RCT comparing MACI with microfracture (MFX) [47]. They assessed the quality of the repair using the macroscopic ICRS II score. A total of 29.2% and 20.8% of patients demonstrated complete integration in the MACI and MFX groups, respectively. The difference in integration between the two groups was not statistically significant. The overall repair assessment revealed that 19.4% of the MACI group and 11.1% of the MFX group achieved a Grade I repair structure, which is defined as being "normal" cartilage.

Histology
Two studies performed biopsies of the repair tissue post-operatively [44,48]. Bhattacharjee et al. performed a case series, using a bone graft combined with ACI as their intervention [44]. Ten biopsy specimens from eight patients were retrieved at different timepoints. The specimens were stained with hematoxylin and eosin to assess the morphology of the repair. Morphological assessment was performed using the ICRS II Histology Score, for which the maximum score attainable is 10 for each domain assessed. Integration was assessed for five patients. Two patients were biopsied twice, one at 11 and 24 months post-operatively and the other at 13 and 37 months post-operatively. They demonstrated increases in integration scores from 7.85 to 9.5 and 9.8 to 10, respectively. The mean integration score at final follow-up was 9.71. Of note, nine of the ten biopsy specimens showed fibrocartilage, and one demonstrated a mixture of fibrocartilage and hyaline cartilage.
Slynarski et al. investigated the use of a copolymer of polyethylene glycol terephthalate and polybutylene terephthalate combined with ACI and mononucleated cells [48]. Histological analyses of 31 osteochondral specimens taken at a variety of time-points were conducted, ranging from six months to 24 months post-operatively. Their assessment drew upon the ICRS II and O'Driscoll grading scales. For 58.6% of patients, there was no visible interface (i.e., there was no gap) between the repair and native tissues. Cartilage was classed as hyaline-like when it stained positively for collagen type II, aggrecan and sulphated glycosaminoglycans (safranin O stain), stained negatively for collagen type I and caused no birefringence of polarized light. Overall, 71% of the specimens demonstrated hyaline-like repair, 19.4% were positive for fibrocartilage and 9.7% were composed of fibrous tissue.

Other Imaging Outcomes
Fifteen studies assessed the quality of cartilage repair on MRI using parameters other than integration (Table S2). All of these reported the degree of filling of the chondral defect. Seven studies reported that a majority of patients achieved complete defect filling by final follow-up [41,43,45,46,51,52,53]. Four studies reporting mean MOCART scores demonstrated an improvement in the mean filling score over time [38,42,49,54]. Whether or not the surface of the repair was intact was also assessed by 13 studies. Five demonstrated that at least half of the enrolled patients had an intact repair surface [45,46,48,52,53]. Six patient cohorts, across a further four studies, showed mixed results, with three cohorts showing improvement over time [38,42,54], and another three showing a declining score after 3 months of follow-up [42,49,54]. An analysis of the structure of the repair tissue is also reported in 14 studies. Seven studies demonstrated that only a minority of patients achieved a homogeneous repair structure [40,41,43,44,45,48,52]. Again, those studies reporting mean scores demonstrated varied results, with three patient cohorts showing improvement [38,42,49], one showing no change [42] and two showing a declining score after 3 months of follow-up [38,54].
Regarding the MRI assessment of the subchondral lamina, four studies demonstrated that most patients had achieved an intact lamina at final follow-up [40,41,46,53]. Five showed the converse [43,44,45,48,52]. However, the studies reporting mean scores all demonstrated improvements in the MOCART subchondral lamina score [38,42,49,54]. The results from the assessment of the quality of the subchondral bone were also varied, with five studies demonstrating a majority with intact subchondral bone [40,45,48,51,53], and four reporting this in fewer than half of the participants [41,43,44,52]. Three studies reported an improvement in the MOCART subchondral bone score over time [42,49,54]. One demonstrated a statistically significant decline [38].
Akgun et al. performed a randomized control trial comparing MSC-seeded collagen scaffolds with MACI [42]. The MSC group performed better in all MOCART domains reported in this review, including the integration score. The difference in mean scores was statistically significant for the degree of defect filling and the surface of the implant.
Ebert et al. investigated the effect of traditional versus accelerated rehabilitation programs following MACI [38,39]. Five-year follow-up consistently demonstrated a reduction in the mean scores for each of the MOCART domains, including border integration, relative to those reported at 2 years.
Ebert and colleagues frequently measured the overall quality of the repair using an MRI or MOCART composite score [38,39,49,54]. This was derived by multiplying the score for each domain by a weighting factor and adding the scores together [49]. A significant improvement was found (standardized mean difference: 1.71; 95% CI (1.88, 3.22); p = 0.03), demonstrating a global improvement across the various domains of the MOCART score over the duration of follow-up (Figure 4).
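The weighted-sum construction described above can be sketched as follows. The domain names, scores and weighting factors in this example are hypothetical placeholders chosen for illustration; the actual weights are those defined in [49] and are not reproduced here.

```python
def composite_score(domain_scores, weights):
    """Multiply each domain score by its weighting factor and add the results."""
    if domain_scores.keys() != weights.keys():
        raise ValueError("every scored domain needs exactly one weight")
    return sum(domain_scores[domain] * weights[domain] for domain in domain_scores)


# Hypothetical domain scores (1 = poor to 4 = excellent) and hypothetical weights.
scores = {"infill": 3, "border_integration": 4, "surface": 3, "structure": 2}
weights = {"infill": 0.25, "border_integration": 0.25, "surface": 0.25, "structure": 0.25}
print(composite_score(scores, weights))
```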
Five studies investigated the improvement of pain symptoms using the Visual Analogue Scale (VAS) Pain Score, for which a lower score, and therefore a negative change, signifies a better outcome [42,45,49,53,54]. A significant improvement was observed at end point (SMD: −6.91; 95% CI (−9.92, −3.90); p = 0.001) (Figure 5F). Improvements in overall patient health are reflected in the SF-36 Physical and Mental health scores. The SF-36 Physical scores improved in all five studies that reported it [41,45,49,53,54].
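For readers unfamiliar with the standardized mean difference, the sketch below computes one common variant (Hedges' g with a large-sample 95% confidence interval) for a single study. It is an assumption-laden illustration: the exact SMD variant used in the pooled analyses is not specified here, baseline and follow-up scores are treated as independent groups, and the input values are hypothetical.

```python
from math import sqrt


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Hedges' g for (b minus a) and its approximate 95% confidence interval."""
    df = n_a + n_b - 2
    pooled_sd = sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    d = (mean_b - mean_a) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    g = correction * d
    var_d = (n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b))
    se_g = correction * sqrt(var_d)
    return g, (g - 1.96 * se_g, g + 1.96 * se_g)


# Hypothetical VAS pain scores: baseline 6.0 (SD 2.0), follow-up 3.0 (SD 2.0), n = 20 each.
g, (lower, upper) = hedges_g(6.0, 2.0, 20, 3.0, 2.0, 20)
print(round(g, 2), round(lower, 2), round(upper, 2))
```

A negative value indicates a lower follow-up score, which for a pain scale corresponds to improvement.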

Subgroup Meta-Analyses
Subgroup meta-analyses were performed to further investigate the effect of using a collagen-based scaffold (Table 4). Out of nine studies utilizing a collagen-based scaffold, eight used a collagen type I/III scaffold [38,39,42,47,49,50,53,54]. Another used a bilayer type I collagen sponge containing chondroitin sulfate [43].
Other scaffolds were composed of an agarose-alginate hydrogel [51], a combination of polyglactin 910 and poly-p-dioxanon [40,41], a benzylic ester of hyaluronic acid (HYAFF 11) [52,55] or a polyethylene glycol terephthalate and polybutylene terephthalate copolymer [48]. One study used a gel, suspending ACIs in fibrinogen and thrombin, which was placed directly on the defect and allowed to harden during the surgical procedure [46]. Two studies used bone grafts covered by either periosteum or a collagen membrane, with ACIs injected under the covering [44,45]. These were not included in the collagen subgroup, as the ACIs were not seeded into the collagenous membrane itself before implantation and because the overall study results included the periosteal flaps.
The improvements in clinical scores were all superior in those using collagen scaffolds. This difference was statistically significant for the IKDC (p = 0.005), KOOS Pain score (p = 0.006), TAS (p = 0.009), SF-36 Physical score (p = 0.0004) and SF-36 Mental score (p < 0.0001). The proportion of patients with graft failures was also reduced for collagen scaffolds.

Risk of Bias Assessment
Overall, there were some concerns with the risk of bias in the randomized studies (Figure 7A). This was primarily caused by potential bias in the randomization process, because most studies did not conceal the allocation of randomized patients to each intervention arm.
For non-randomized studies, the overall risk of bias was moderate (Figure 7B). The most frequent source of potential bias was in the measurement of the outcome. This was because assessors were rarely blinded to the purposes of the intervention administered.

Discussion
In this review, we investigated the degree of integration between repair and native cartilage after using current ACI or MSC-seeded implants for focal chondral defects of the tibiofemoral joint. We selected 17 studies that contained quantitative information regarding the degree of integration, either as a score or as the proportion of patients achieving complete integration, as determined by MRI, arthroscopy or histology. Our primary findings show that ACI is associated with good quality integration. The results for clinical outcomes, including function and pain, also demonstrated improvements after the use of ACI. Although the limited evidence available suggests that MSCs can achieve improvements in integration and clinical outcomes, broad conclusions could not be drawn due to the relative lack of studies treating focal chondral defects with MSC implants. This is a common finding of other reviews, which remark that clinical studies investigating MSCs are few in number and involve small patient cohorts [56][57][58]. These and other limitations, including limited follow-up and the heterogeneity of data, are evaluated in this discussion.

MRI as an Investigative Technique for Integration
The results of the meta-analyses for integration were encouraging, with 64% of 200 patients achieving complete integration (95% CI (51%, 75%)). Only in two of the ten comparable papers did a minority of patients undergoing ACI achieve complete integration [40,41]. These were studies investigating the same group of patients and interventions at different timepoints.
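As an illustration of how study-level proportions can be combined into a single estimate with a confidence interval, the sketch below pools proportions on the logit scale using inverse-variance weights. It is a simplified fixed-effect version offered under stated assumptions: the model actually fitted in RStudio may differ (for example, by including a random-effects term), and the event counts shown are hypothetical rather than extracted data.

```python
from math import exp, log, sqrt


def pool_proportions(events_and_totals):
    """Pool (events, total) pairs on the logit scale with inverse-variance weights."""
    logits, weights = [], []
    for events, total in events_and_totals:
        if events == 0 or events == total:
            # Continuity correction so the logit and its variance stay finite.
            events, total = events + 0.5, total + 1
        non_events = total - events
        logits.append(log(events / non_events))
        weights.append(1 / (1 / events + 1 / non_events))
    pooled = sum(w * x for w, x in zip(weights, logits)) / sum(weights)
    se = sqrt(1 / sum(weights))

    def inverse_logit(x):
        return 1 / (1 + exp(-x))

    interval = (inverse_logit(pooled - 1.96 * se), inverse_logit(pooled + 1.96 * se))
    return inverse_logit(pooled), interval


# Hypothetical studies: patients with complete integration out of patients assessed.
estimate, (lower, upper) = pool_proportions([(14, 20), (9, 15), (18, 30)])
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```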
MRI was by far the most popular choice of investigation, with almost all studies (16/17) using it to evaluate repair cartilage morphology. This is likely because MRI is widely available, non-invasive and validated as a technique for assessing repair cartilage [26]. Scanners of similar specifications (high-resolution MRI, 1.0 to 3.0 T) and sequencing were used across the studies. All looked at the tibiofemoral joint from both coronal and sagittal planes, with the fast-spin echo (dual T2-FSE) and fat-suppressed gradient echo sequences (3D-GE-FS), in accordance with the approach used by Marlovits et al. [50]. Although data were available for most patients, there were concerns over the potential risk of bias due to missing outcome data for both RCTs and non-randomized studies. Most RCTs reported using at least one radiologist, blinded to the patients' clinical details and to the procedure. Blinding to the procedure usually did not occur in the non-randomized case series, possibly introducing reporting bias.
The studies did not report integration outcomes consistently, limiting the amount of comparable data available. Ten studies reported the proportion of patients achieving complete integration [31,40,41,43,44,45,46,48,52,53], and five reported the mean border zone integration score [38,39,42,49,54]. As a result, not all quantitative results were comparable, limiting the number of patients included in each analysis. Consistent reporting of the integration score would allow for comparison between a greater number of studies and increase the power of any subsequent meta-analysis.
Our review highlighted the importance for further studies to record integration specifically. Using an overall or composite MOCART score without reference to individual domains did not allow for interpretation of the degree of integration in many of the studies eligible for full-text screening. The evidence already suggests that MOCART scores might not correlate well with patient characteristics and surgical outcomes [30]. Therefore, if we are to understand how integration relates to improved clinical outcomes, future studies will have to consider integration independently.
The significant statistical heterogeneity (I² = 90%) revealed by our analysis of the comparable MOCART border integration scores means that a cautious interpretation of the statistically significant improvement (SMD: 1.16; 95% CI (0.07, 2.24); p = 0.04) should be made. Variance in the pooled results of multiple studies is often due to a random sampling error or clinical heterogeneity [59]. The patient baseline characteristics were similar for the studies included in the meta-analysis, so it is unlikely that the errors arose from non-random sampling. Clinical heterogeneity, on the other hand, is more likely to have contributed to this. This might be the result of factors such as (1) differences in treatment, (2) differences in study design or (3) differences in data analysis methods [59]. In our meta-analysis, the four studies performed by Ebert et al. used a similar intervention, MACI, but involved different rehabilitation protocols and follow-up times. The statistical heterogeneity observed here may be partially attributed to these differences. Furthermore, Akgun et al. used MACI in one patient cohort and MSC-seeded collagen scaffolds in the other [42]. This difference in interventions may have contributed to statistical heterogeneity, possibly making the pooled effect size unreliable.
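To show how the heterogeneity statistics relate to one another, the sketch below computes Cochran's Q from study-level effect sizes and their variances, and then derives the I² statistic as the percentage of Q in excess of its degrees of freedom. The effect sizes and variances are hypothetical and the function name is an assumption; the sketch is not a reproduction of the analysis reported here.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q and the I-squared statistic (as a percentage)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is truncated at zero when Q does not exceed its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i_squared


# Hypothetical standardized mean differences and their variances.
q, i_squared = heterogeneity([0.2, 1.1, 2.4, 0.9], [0.05, 0.08, 0.10, 0.06])
print(round(q, 1), round(i_squared, 1))
```

An I² near 90% therefore means that roughly nine tenths of the observed variation in effect sizes exceeds what sampling error alone would be expected to produce.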
We focused on integration because of its perceived clinical relevance to the durability of a repair. However, it is important to recognize that there are additional parameters used to assess for the quality of cartilage repair. Interestingly, in two papers investigating the same patients over time, a minority achieved complete integration [40,41], but "defect filling" was achieved in most patients at end point. For MACI and ACI-P, the percentage of patients achieving complete defect filling was 50.0% and 11.1%, respectively, at 24 months. These values had increased to 55.5% and 71.4% by the time of final follow-up, which was performed at 9.6 ± 0.9 years (MACI) or 8.6 ± 0.8 years (ACI-P). One may interpret these results as a long-lasting success in the repair of a cartilage defect, even though integration did not improve. As we have done for integration, defect fill and other MOCART domains warrant formal investigation.

Molecular Analysis for Optimal Scaffold and Source of Cells
A major factor contributing to clinical heterogeneity was variation in the making of implants, primarily regarding cell retrieval and culture and the choice of scaffold composition. Not every study reported the location from which cells were retrieved. Eleven studies mentioned non-weightbearing zones as the source, and fewer mentioned specifically that the lateral or medial femoral condyle or intercondylar notch was the location biopsied. The medium with which cells were cultured was often omitted, but, in those that did mention it, media included serum of the patients' own blood and Ham's F12 containing 10% fetal calf serum [42]. Culture duration ranged from 3 days [40] to 8 weeks [39]. Many studies did not report the cell density in scaffolds, but, among those that did, there was a wide range from 2 to 30 million cells/scaffold.
The ACI technique, including the scaffold type, varied among the studies. MACI was the most common. Two studies used ACI-P, the more traditional and now less clinically relevant technique, albeit as a control [40,41], and a single study used gel-type ACI (GACI) [46]. The most common scaffold used was a type I/III collagen scaffold. Others included an agarose-alginate hydrogel [51], a bilayer type I collagen sponge containing chondroitin sulfate (Novocart 3D) [43], a benzylic ester of hyaluronic acid (Hyalograft C) [52,55], polyglactin 910 combined with poly-p-dioxanone [40,41] and a polyethylene glycol terephthalate and polybutylene terephthalate copolymer [48]. As demonstrated by our subgroup analysis, collagen scaffolds were associated with improved integration, better clinical outcomes and a lower graft failure rate. It is difficult to find further evidence to corroborate this in the current literature. One systematic review found only weak evidence of superiority of MACI relative to ACI-P [60]. A prospective series comparing failure rates of different ACI techniques found that altered polymer combinations with collagen performed differently [61]. Their ACI-seeded fibrin-collagen patch demonstrated fewer failures than the collagen-hydroxyapatite scaffold and alginate-agarose hydrogel scaffold.
Variations in the therapeutic process are well documented in the literature, with other reviews sharing the same observation. In a systematic review of various ACI studies, Migliorini et al. suggested that a lack of consensus on what is the best combination of cell type and method for producing implants, as well as continuous innovation and novel scaffold types, have contributed to a wide variety of available ACI therapies [62]. Establishing the most effective combination of these factors will be essential if outcomes after ACI therapy are to be optimized. This applies not only to integration but also to other morphological and clinical outcomes.
Evaluation of the molecular profile of cells has already been used to confirm the presence of MSCs or autologous chondrocytes before seeding them into scaffolds. Akgun et al. used flow cytometry or PCR to test for the expression, by chondrocytes, of CD44 and CD73 and the lack of CD45, as a quality assessment, since this expression profile is associated with a better differentiation capacity [42]. Research has shown that other markers might also be relevant, such as S-100, aggrecan, TGF-β, glucocorticoid receptor alpha and the vitamin D3 receptor, which are associated with low rates of apoptosis [63]. With no consensus on what cell type to use, investigating the molecular profile of autologous chondrocytes may be a useful screening tool to determine which profile improves integration. This characterization is likely to be even more important for MSCs, given the wider range of tissues from which MSCs can be cultivated, including adipose, peripheral blood and bone marrow. In fact, Park et al. found that a substantial number of errors have been made in labelling the therapeutic MSCs used for cartilage repair [64], making an objective screening method such as the molecular profile even more pertinent.
MSCs remain a promising yet under-investigated therapeutic option for patients with focal chondral defects. Migliorini et al. showed in their review that MSCs resulted in significant improvements post-operatively, including in clinical scores such as the KOOS [62]. Another benefit of using MSCs is that it avoids the necessity of primary arthroscopic cartilage harvesting for ACI. Akgun et al. found that, for some assessments, the use of MSCs outperformed the MACI technique [42]. Given these positive but limited results, further investigation is certainly warranted for the use of MSCs in treating chondral defects, including the need to optimize MSC implant integration.
Currently, widespread uptake of MSC implant therapies in clinical practice has not yet taken place, with most data being derived from pre-clinical studies [57,58]. As well as the relatively small number of human studies, there are limited follow-up data available in comparison to ACI [57,58,62]. This not only limits our knowledge of the long-term efficacy of MSC implants but also has implications for potential safety issues. The possibility of tumors developing from implanted MSCs and concerns over differentiation into unwanted tissue types have been raised [57]. However, these have not yet been realized clinically in the investigation of cartilage repair. A greater number of human studies with continuous follow-up would provide valuable long-term data regarding the safety and efficacy of this therapy.
There are also several limitations to our current understanding of which MSC delivery protocol is the most effective. This is partially due to heterogeneity in the therapeutic compositions used to date, recognized by others as an obstacle to collating data in meta-analyses [56]. Uncertainty regarding the best cell source, cell dosage, the presence of growth factors and rehabilitation protocols all contribute to heterogeneity [57,58]. Investigated techniques reported in the literature demonstrate wide variability in each of these domains [20], limiting the comparisons that can be made between studies. This is compounded by the small number of randomized comparative studies in humans, with case series of few patients predominantly making up the evidence base [56,57]. A greater number of randomized, double-blinded controlled trials investigating the efficacy of MSC implants relative to established therapies, such as MACI, would contribute towards less biased evidence. Knowledge of the best treatment protocol would also be furthered by clinical studies comparing different MSC therapies, with the aim of identifying the best cell source, dose and supplementary growth factors [57]. Advancements in each of these domains have led to agreement on a technique, MACI, which is now widely used for chondrocyte implantation. This has, to some extent, mitigated the heterogeneity between studies that had previously challenged teams implementing ACI [57,65]. The hope is that the same could be achieved for MSC implant therapies to homogenize treatment protocols and make data more comparable.

Role of Arthroscopy and Histology
Selmi et al. found that a majority of patients achieved complete integration, based on the macroscopic ICRS scoring, while Saris et al. found that only a minority of patients achieved this after MACI. With only two studies available, a meta-analysis could not be performed. While Saris et al. assessed many subjects arthroscopically (60 patients), Selmi et al. assessed only a small cohort of 13 subjects. Small arthroscopic cohorts appear to be a common theme among studies, likely because patients dislike the invasive nature of arthroscopy compared to MRI and because of the ethical issue of subjecting a patient who is already satisfied with their improved knee symptoms to another procedure [47,51].
Biopsy remains the "gold standard" for determining integration of the repair cartilage, since MRI is less accurate and results in more inter-observer variability [66]. Two studies demonstrated good quality integration on a histological assessment [44,48]. Furthermore, biopsy allows for the determination of the repair cartilage phenotype, which varied in the included studies. Hyaline cartilage, like that of the native knee cartilage, is more desirable than fibrocartilage, which demonstrates poorer mechanical properties [48]. Biopsies at the border between implant and native tissue would give a better idea of integration, as demonstrated by the homogeneity of hyaline cartilage across the implant-native cartilage interface. Immunohistochemical analysis would be a sensitive method for detecting this homogeneity, as well as the cartilage phenotype, by indicating a tissue that is either made predominantly of type II collagen (hyaline cartilage) or type I collagen and IIA procollagen (fibrocartilage) [67].

Long-Term Outcomes
Sixteen selected studies included follow-up to at least 24 months, but only eight performed additional measurements up to five years, and three studies included follow-up beyond that. This made it difficult to draw conclusions about long-term outcomes. In our review, graft failure was seen in a small proportion of patients, but, without long-term measurements, it is impossible to determine whether this changes over time. This seems to be common in the current literature. Mistry et al. describe how the lack of long-term follow-up and the simultaneous rapid evolution of ACI mean that long-term data are available only for outdated techniques [16]. Understanding how integration and other outcomes change in the long term is essential to providing a current indication of performance to be improved upon and determining which additional complications might occur, such as the need for re-operation. Prioritizing this investigation seems reasonable, as some studies have already demonstrated deteriorating outcomes over time. For example, Ebert and colleagues found that integration, and other MOCART outcomes, were poorer at five years compared to 24 months post-operatively [38,39]. Ochs et al. found that the Lysholm and IKDC scores either plateaued or decreased from 36 to 48 months [43]. Given that the nature of change in both imaging and clinical outcomes over time is not fully understood, we suggest that more authors follow up their patients in the longer term to elucidate this.

Strengths and Limitations
Our review possesses several strengths that enable us to make meaningful conclusions. The search strategy was extensive, including any study that investigated integration, and has allowed us to thoroughly extract data regarding multiple outcome measures, including MRI, arthroscopic and histological assessments, as well as clinical data. Robust inclusion and exclusion criteria allowed for a valid comparison of included studies. By extracting quantitative data, we were able to conduct meta-analyses to give more precise interpretations across the pooled studies.
However, there are limitations to the included studies. They show a high degree of heterogeneity in the scaffold composition and the length of follow-up. Ten of the seventeen studies are non-blinded, non-randomized case series, which have demonstrated a moderate risk of bias, particularly due to missing data and in the measurement of outcomes related to repair cartilage morphology. Common across many papers was a small sample size, meaning they might have been underpowered.
The largest limitation was that few analyses could be performed for MSC implants, as only one paper investigating MSCs met the inclusion criteria. Common reasons for the exclusion of MSC studies were the enrolment of patients with diffuse osteoarthritis, the use of intra-articular injections rather than implants and indistinguishable reporting of the results of treating patellofemoral and tibiofemoral joints. The inclusion of these papers would have resulted in an invalid comparison of patients with different defect types, cell-delivery methods and defect locations, respectively.

Conclusions
This systematic review and meta-analysis has collated integration outcomes following the treatment of focal chondral defects of the tibiofemoral joint with cell-seeded scaffolds. Though there were insufficient papers to make generalizations about MSC therapies, the studies we have selected suggest that the degree of integration between chondrocyte-seeded scaffolds and native cartilage appears to be of good quality in most patients. This is associated with simultaneous improvements in clinical scores. More evidence for the integration of MSC-based implants is awaited, but the current results are encouraging and suggest that MSCs may be superior to ACI in achieving an integrated repair structure. There is heterogeneity in cell sources, scaffolds and the processing of cultured cells between studies. We suggest that molecular methods, such as cluster of differentiation (CD) characterization, should be used to screen for the quality of implanted cells and that collagen scaffolds should be widely adopted. In addition, more consistent recording of integration would allow for greater comparison between studies. By conducting this review, we hope to have established a baseline standard against which further investigations can be compared to optimize integrative repair and tackle the significant consequences of chondral defects.